Abstract

Hairpin completion is an abstract operation modeling a DNA bio-operation
which receives as input a DNA strand $w = x\alpha y\overline{\alpha}$, and outputs
$w' = x\alpha y\overline{\alpha}\,\overline{x}$, where $\overline{x}$ denotes the
Watson-Crick complement of $x$. In this paper, we focus on the problem of finding
conditions under which the iterated hairpin completion of a given word is regular.
According to the numbers of words $\alpha$ and $\overline{\alpha}$ that initiate
hairpin completion and how they are scattered, we classify the set of all words $w$.
For some basic classes of words $w$ containing small numbers of occurrences of
$\alpha$ and $\overline{\alpha}$, we prove that the iterated hairpin completion of
$w$ is regular. For other classes with higher numbers of occurrences of $\alpha$ and
$\overline{\alpha}$, we prove a necessary and sufficient condition for the iterated
hairpin completion of a word in these classes to be regular.

1 Introduction 

A DNA strand can be abstractly viewed as a word over the alphabet {A, C, G, T},
wherein A is Watson-Crick complementary to T and C to G, and two complementary
DNA single strands of opposite orientation bind together to form a double
DNA strand (intermolecular structure). Also, if subwords of a DNA strand are
complementary, the strand may bind to itself, forming intramolecular structures
such as stem-loops, more commonly known as hairpins (Figure 1 (2)). Hairpins
can be a building block of a larger-scale structure of RNA strands, play
a role in determining various chemical and thermodynamical properties (stability,
structures, functions) of the structure, and make significant contributions to
genetic information processing, as illustrated by their function as a stopper
for messenger RNA (mRNA) transcription. A CG-rich sequence of an mRNA
folds into its Watson-Crick complement on the RNA and forms a stable hairpin.
Transcription of the mRNA is terminated when RNA polymerase reaches the
hairpin. At that time, the nusA protein bound to the polymerase interacts with
the hairpin and takes the polymerase off the mRNA. This hairpin-driven mechanism
is called intrinsic termination [23]. As such, hairpins tend to interfere
with reactions, and therefore were given the cold shoulder by DNA computing
experimentalists. See [1, 2, 9, 10, 12, 19] about this problem and about some of
the "good" designs of DNA strands that are free of hairpins.

[Figure 1 panels: (2) extension; (3) denaturation]

Figure 1: Hairpin completion by polymerase chain reaction [3, 20]. The operation
input is $x\alpha y\overline{\alpha}$, the output is
$x\alpha y\overline{\alpha}\,\overline{x}$, and the primer is $\overline{\alpha}$.

Hairpin is not a foe to all DNA computing experiments; many molecular 
computing machineries have been proposed which make good use of hairpins. 
Such hairpin-driven systems include DNA RAM [10, 11, 22] and Whiplash PCR
[20]. In particular, Whiplash PCR features a self-directed polymerase chain 
reaction (PCR) of DNA strand, which practically motivates the investigation 
of a formal language operation called hairpin completion. Hairpin completion 
proceeds as follows (Figure 1): Starting from a DNA strand
$w = x\alpha y\overline{\alpha}$, a segment $\overline{\alpha}$ at the 3'-end of $w$
binds to its Watson-Crick complementary strand $\alpha$ on the strand (annealing).
A polymerase chain reaction then extends $w$ at its 3'-end in the 5' $\to$ 3'
direction so as to generate the strand $x\alpha y\overline{\alpha}\,\overline{x}$
(let us call $\alpha$ and $\overline{\alpha}$ that bind with each other to initiate
this PCR reaction primers).
Despite the intrinsic 5' $\to$ 3' polarity of polymerases, a mechanism exists to
make the polymerase reaction work in the 3' $\to$ 5' direction (Okazaki fragments).

As an abstract model of the above-mentioned self-directed PCR, Cheptea,
Martín-Vide, and Mitrana proposed the hairpin completion in [3], and since then
this abstract operation has been studied on its algorithmic and formal linguistic
aspects [5, 15, 16, 17] together with its variant called bounded hairpin completion
[8, 14], where the length of extension in one operation is bounded by a constant.
Ito et al. [8] and Kopecki [14] proved that all classes in the Chomsky Hierarchy 
are closed under iterated bounded hairpin completion. In contrast, the class 
of regular languages was proved not to be closed under iterated (unbounded) 
hairpin completion [3] , and a surprising fact is that iterated hairpin completion 
of a word can be non-regular [TJ . In this paper, we focus on a problem proposed 
by Kopecki in [14]: is it decidable whether the iterated hairpin completion of a
given word is regular? The iterated hairpin completion of a singleton language
(a word) is known to be in NL [3], but can be non-regular as shown in the
following example.

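Example 1. Let $\alpha = a^k$ and
$w = \alpha b\alpha c\alpha\overline{\alpha}d\overline{\alpha}$, where
$a, \overline{a}, b, \overline{b}, c, \overline{c}, d, \overline{d}$ are all distinct
letters. Then the intersection of the iterated hairpin completion of $w$ with
$(\alpha b\alpha c(\alpha b)^+\alpha\overline{d})^2\,\alpha b\alpha c\alpha\overline{\alpha}d\overline{\alpha}\,(\overline{b}\,\overline{\alpha})^+\overline{c}\,\overline{\alpha}\,\overline{b}\,\overline{\alpha}$
is
$$\{(\alpha b\alpha c(\alpha b)^i\alpha\overline{d})^2\,\alpha b\alpha c\alpha\overline{\alpha}d\overline{\alpha}\,(\overline{b}\,\overline{\alpha})^i\,\overline{c}\,\overline{\alpha}\,\overline{b}\,\overline{\alpha} \mid i \ge 1\}.$$
This intersection is not context-free, and neither is the iterated hairpin
completion.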

In this paper, we give a partial answer to the regularity-test decidability 
problem. We focus our attention on the number of primers a given word contains 
as its factors and on how these primers are scattered over the given word. All the 
words are classified in accordance with these two criteria, and for some basic 
classes, we give a necessary and sufficient condition for the iterated hairpin 
completion of a word in the class to be regular. 

2 Preliminaries 

Let $\Sigma$ be an alphabet, $\Sigma^*$ be the set of all words over $\Sigma$, and
for an integer $k \ge 0$, $\Sigma^k$ be the set of all words of length $k$ over
$\Sigma$. The word of length 0 is called the empty word, denoted by $\lambda$, and
let $\Sigma^+ = \Sigma^* \setminus \{\lambda\}$. A subset of $\Sigma^*$ is called a
language over $\Sigma$. For a word $w \in \Sigma^*$, we employ the notation $w$ when
we mean the word as well as the singleton language $\{w\}$ unless confusion arises.
For a language $L \subseteq \Sigma^*$, we denote by $L^*$ the set
$\{w_1 \cdots w_n \mid n \ge 0, w_1, \ldots, w_n \in L\}$.

We equip $\Sigma$ with a function $\overline{\phantom{a}} : \Sigma \to \Sigma$
satisfying $\overline{\overline{a}} = a$ for all $a \in \Sigma$; such a function is
called an involution. This involution is naturally extended to words as: for
$a_1, a_2, \ldots, a_n \in \Sigma$,
$\overline{a_1a_2\cdots a_n} = \overline{a_n}\cdots\overline{a_2}\,\overline{a_1}$.
For example, over the 4-letter alphabet $\Delta = \{A, C, G, T\}$, if we define an
involution $d : \Delta \to \Delta$ as $d(A) = T$ and $d(C) = G$, then $d$, being thus
extended, maps the Watson strand of a complete DNA double strand into its Crick
strand. The involution $d$ is called the Watson-Crick involution [13]. For a word
$w \in \Sigma^*$, we call $\overline{w}$ the complement of $w$, being inspired by
this application. A word $w \in \Sigma^*$ is called a pseudo-palindrome if
$w = \overline{w}$. For a language $L \subseteq \Sigma^*$,
$\overline{L} = \{\overline{w} \mid w \in L\}$.

For words $u, w \in \Sigma^*$, if $w = xuy$ holds for some words
$x, y \in \Sigma^*$, then $u$ is called a factor of $w$; a factor that is distinct
from $w$ is said to be proper. If the equation holds with $x = \lambda$
($y = \lambda$), then the factor $u$ is especially called a prefix (resp. a suffix)
of $w$. The prefix relation can be regarded as a partial order $\le_p$ over
$\Sigma^*$; $u \le_p w$ means that $u$ is a prefix of $w$. Analogously, by
$w \ge_s v$ we mean that $v$ is a suffix of $w$. For a word $w \in \Sigma^*$ and a
language $L \subseteq \Sigma^*$, a factor $v$ of $w$ is minimal with respect to $L$
if $v \in L$ and none of the proper factors of $v$ is in $L$.

A nonempty word $w \in \Sigma^+$ is primitive if $w = x^i$ implies $i = 1$ for any
nonempty word $x \in \Sigma^+$. It is well known that for any nonempty word $w$,
there exists a unique primitive word $u$ with $w \in u^+$. Such $u$ is called the
primitive root of $w$ and denoted by $\rho(w)$. Two words $x, y \in \Sigma^*$
commute if $xy = yx$, and this is known to be equivalent to $\rho(x) = \rho(y)$.
See (J for details of primitivity
and commutativity of words and related results.
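The notions of primitive root and commutation are easy to experiment with. The following is a minimal Python sketch, not taken from the paper; the function names `primitive_root` and `commute` are illustrative choices, and the brute-force search over period lengths is only one of several ways to compute $\rho(w)$.

```python
def primitive_root(w: str) -> str:
    """Return the shortest word u such that w is a power of u (w must be nonempty)."""
    if not w:
        raise ValueError("the primitive root is defined for nonempty words only")
    n = len(w)
    for p in range(1, n + 1):
        # u = w[:p] is a candidate root only if its length divides |w|
        if n % p == 0 and w[:p] * (n // p) == w:
            return w[:p]
    return w  # unreachable: p = n always succeeds


def commute(x: str, y: str) -> bool:
    """Two words commute if xy = yx."""
    return x + y == y + x


if __name__ == "__main__":
    print(primitive_root("abab"))   # root of (ab)^2
    print(commute("abab", "ab"))    # both are powers of "ab"
    print(commute("ab", "ba"))      # different primitive roots
```

For nonempty words, `commute(x, y)` and `primitive_root(x) == primitive_root(y)` should agree, which is the equivalence recalled above.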
Now we introduce the operation investigated in this paper, that is, hairpin
completion, and define it formally. Imagine that we have a DNA sequence
5'-CAATCGTATGAT-3'. The suffix GAT can find its $d$-image as a factor ATC on
this sequence. Hence, this DNA sequence may bend over into a hairpin form by
GAT binding with ATC. This formation of hairpin structure leaves CA as a free
sticky end, and DNA polymerase converts it into the complete double strand by
extending its 3'-end by TG = $d$(CA). This exemplifies the mechanism of hairpin
completion. We call the two words whose binding initiates hairpin completion
primers. In the above example, GAT and ATC are primers.
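The DNA example above can be replayed in a few lines. This is a minimal sketch under stated assumptions: the names `wc` and `right_hairpin_completions` are hypothetical, the primer is taken to be the length-$k$ suffix of the strand, and overlapping primers are rejected as in the informal description.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def wc(word: str) -> str:
    """Watson-Crick involution: reverse the word and complement each letter."""
    return "".join(COMPLEMENT[c] for c in reversed(word))


def right_hairpin_completions(w: str, k: int) -> set[str]:
    """All words obtained from w by one right hairpin completion with primers of length k.

    The suffix of length k plays the role of the primer; its Watson-Crick image
    must occur earlier in w without overlapping that suffix.
    """
    results: set[str] = set()
    if len(w) < 2 * k:
        return results
    alpha = wc(w[-k:])  # the factor the suffix can bind to
    for i in range(len(w) - 2 * k + 1):  # i + k <= len(w) - k: no overlap
        if w[i:i + k] == alpha:
            # the free sticky end w[:i] is copied as its complement at the 3'-end
            results.add(w + wc(w[:i]))
    return results


if __name__ == "__main__":
    print(wc("GAT"))
    print(right_hairpin_completions("CAATCGTATGAT", 3))
```

On the strand CAATCGTATGAT with $k = 3$, the only binding site for the suffix GAT is the factor ATC at position 2, so the strand is extended by the image of the sticky end CA.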

Let $k$ be a constant that is assumed to be the length of a primer. Throughout
this paper, we will not use the notation '$k$' for any other purpose. Let
$\alpha \in \Sigma^k$ be a primer. If a given word $w \in \Sigma^*$ has a
factorization $u\alpha v\overline{\alpha}$ for some $u, v \in \Sigma^*$ and
$\alpha \in \Sigma^k$, then its right hairpin completion with respect to $\alpha$
results in the word $u\alpha v\overline{\alpha}\,\overline{u}$. As long as $\alpha$
is clear from context, this operation is simply called (single-primer) right hairpin
completion. By $w \to_{\mathcal{RH}_\alpha} w'$, or by $w \to_{\mathcal{RH}} w'$, we
mean that $w'$ can be obtained from $w$ by right hairpin completion (with respect
to $\alpha$). The left hairpin completion is defined analogously as an operation to
derive $\overline{u'}\alpha v'\overline{\alpha}u'$ from $\alpha v'\overline{\alpha}u'$,
and the relation $\to_{\mathcal{LH}_\alpha}$ is naturally introduced.
By $\to^*_{\mathcal{LH}}$ and $\to^*_{\mathcal{RH}}$, we denote the reflexive
transitive closure of $\to_{\mathcal{LH}}$ and that of $\to_{\mathcal{RH}}$,
respectively. The relation $\to_{\mathcal{H}}$ is defined as the union of
$\to_{\mathcal{LH}}$ and $\to_{\mathcal{RH}}$.
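For a given language $L \subseteq \Sigma^*$, we define the set of words obtained by
left hairpin completion from $L$, and the set of words obtained by iterated left
hairpin completion from $L$, respectively, as follows:

$$\mathcal{LH}_\alpha(L) = \{w' \mid \exists w \in L, w \to_{\mathcal{LH}_\alpha} w'\}, \qquad \mathcal{LH}^*_\alpha(L) = \{w' \mid \exists w \in L, w \to^*_{\mathcal{LH}_\alpha} w'\}.$$

Analogously, $\mathcal{RH}_\alpha(L)$ and $\mathcal{RH}^*_\alpha(L)$ are defined
based on $\to_{\mathcal{RH}}$ and $\to^*_{\mathcal{RH}}$, and $\mathcal{H}_\alpha(L)$
and $\mathcal{H}^*_\alpha(L)$ are defined based on $\to_{\mathcal{H}}$ and
$\to^*_{\mathcal{H}}$.

Proposition 1. For a word $w \in \Sigma^*$,
$\overline{\mathcal{RH}^*_\alpha(w)} = \mathcal{LH}^*_\alpha(\overline{w})$.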

3 Word structures relevant to the power of iterated hairpin completion

In this section, we describe several structural properties of a word $w$ that will
be relevant for the characterization of its iterated hairpin completion
$\mathcal{H}^*_\alpha(w)$, where $\alpha \in \Sigma^k$ is a fixed parameter.

A word $u \in \Sigma^*$ is called an $\alpha$-prefix of a word $w \in \Sigma^*$ if
$w = u\alpha x$ for some word $x \in \Sigma^*$. In a similar manner, a word
$v \in \Sigma^*$ is an $\overline{\alpha}$-suffix of $w$ if $w = y\overline{\alpha}v$
for some $y \in \Sigma^*$. If $w = y\overline{\alpha}v$ begins with $\alpha$, then
this prefix can bind with the occurrence of $\overline{\alpha}$ (unless they overlap
with each other), and left hairpin completion results in $\overline{v}w$. By
$\mathrm{Pref}_\alpha(w)$ and $\mathrm{Suff}_{\overline{\alpha}}(w)$, we denote the
set of all $\alpha$-prefixes and that of all $\overline{\alpha}$-suffixes of $w$,
respectively. One can easily observe that
$\overline{\mathrm{Suff}_{\overline{\alpha}}(w)} = \mathrm{Pref}_\alpha(\overline{w})$.
Throughout this paper, we let $\mathrm{Pref}_\alpha(w) = \{u_1, \ldots, u_m\}$
and $\mathrm{Suff}_{\overline{\alpha}}(w) = \{\overline{v_1}, \ldots, \overline{v_n}\}$
for some $m, n \ge 0$. It will be convenient to assume that these $\alpha$-prefixes
are sorted in the ascending order of their length. Likewise, we assume that
$|\overline{v_1}| < |\overline{v_2}| < \cdots < |\overline{v_n}|$.
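The two sets just defined can be computed directly by scanning for occurrences of the primers. The sketch below is illustrative only. It assumes a hypothetical encoding in which the complement of a lowercase letter is the corresponding uppercase letter (so the involution is "reverse and swap case"); the function names are not from the paper.

```python
def bar(w: str) -> str:
    """Involution under the assumed encoding: reverse the word and swap letter case."""
    return w[::-1].swapcase()


def alpha_prefixes(w: str, alpha: str) -> list[str]:
    """Pref_alpha(w): all u with w = u alpha x, in ascending order of length."""
    return [w[:i] for i in range(len(w) - len(alpha) + 1) if w.startswith(alpha, i)]


def alpha_bar_suffixes(w: str, alpha: str) -> list[str]:
    """Suff_{bar(alpha)}(w): all v with w = y bar(alpha) v, in ascending order of length."""
    ab = bar(alpha)
    k = len(ab)
    # scanning occurrences from right to left yields the shortest suffix first
    return [w[i + k:] for i in range(len(w) - k, -1, -1) if w.startswith(ab, i)]


if __name__ == "__main__":
    w, alpha = "abacaAdA", "a"          # the word of Example 1 with k = 1
    print(alpha_prefixes(w, alpha))     # u_1, u_2, u_3
    print(alpha_bar_suffixes(w, alpha)) # bar(v_1), bar(v_2)
    # the identity bar(Suff(w)) = Pref(bar(w)) on this instance
    print([bar(v) for v in alpha_bar_suffixes(w, alpha)] == alpha_prefixes(bar(w), alpha))
```

With this encoding, the word of Example 1 for $k = 1$ becomes `abacaAdA`, which has three $\alpha$-prefixes and two $\overline{\alpha}$-suffixes.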

Our investigation of the properties of the $\alpha$-prefixes and
$\overline{\alpha}$-suffixes of a word begins with a basic observation.

Proposition 2. For a word $w \in \alpha\Sigma^*$, the following statements hold:

1. for any $u \in \mathrm{Pref}_\alpha(w)$, $\alpha \le_p u\alpha$;

2. for any $x_1, \ldots, x_n \in \mathrm{Pref}_\alpha(w)$, $\alpha \le_p x_1 \cdots x_n\alpha$.

Proof. The first statement derives directly from the definition of $\alpha$-prefix.
For the second one, induction on $n$ works. Due to the first statement,
$\alpha \le_p x_n\alpha$ so that proving $\alpha \le_p x_1 \cdots x_{n-1}x_n\alpha$ is
reduced to proving $\alpha \le_p x_1 \cdots x_{n-1}\alpha$. □

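From this proposition, we can easily deduce that for a word
$w \in \Sigma^*\overline{\alpha}$ and
$\overline{y_1}, \ldots, \overline{y_t} \in \mathrm{Suff}_{\overline{\alpha}}(w)$,
$\overline{\alpha}\,\overline{y_1}\cdots\overline{y_t} \ge_s \overline{\alpha}$, which
means $\alpha \le_p y_t \cdots y_1\alpha$. This deepens the above observation further
as follows.

Corollary 1. For a word $w \in \alpha\Sigma^* \cap \Sigma^*\overline{\alpha}$, any word
in $(\mathrm{Pref}_\alpha(w) \cup \overline{\mathrm{Suff}_{\overline{\alpha}}(w)})^*\alpha$
has $\alpha$ as its prefix.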

Due to the second statement of Proposition 2,
$\alpha \le_p x_1\alpha \le_p x_1x_2\alpha \le_p \cdots \le_p x_1x_2\cdots x_s\alpha$
holds for $\alpha$-prefixes $x_1, \ldots, x_s \in \mathrm{Pref}_\alpha(w)$. Hence,
from a word $x_1x_2\cdots x_s\alpha w'\overline{\alpha}$, one-step right hairpin
completion can produce at least the words
$x_1x_2\cdots x_s\alpha w'\overline{\alpha}\{\lambda, \overline{x_1}, \overline{x_1x_2}, \ldots, \overline{x_1x_2\cdots x_s}\}$.$^1$
Now, if we know that one-step hairpin completion extends the word to the right by
$\overline{u}$, what can we say about the word $u$? Firstly, as long as
$|u| < |x_1 \cdots x_s|$, we can say that $u\alpha \le_p x_1 \cdots x_s\alpha$ by
definition of hairpin completion. Moreover, Corollary 1 enables us to find
$0 \le i < s$ such that $|x_1 \cdots x_i| \le |u| < |x_1 \cdots x_{i+1}|$. Then, one
can let $u = x_1 \cdots x_iz$ for some prefix $z$ of $x_{i+1}$. Since
$z\alpha \le_p x_{i+1}\alpha \le_p w$, $z$ is an $\alpha$-prefix of $w$ that is
properly shorter than $x_{i+1}$. By defining $\mathrm{ind}(x_{i+1})$ to be the index
satisfying $u_{\mathrm{ind}(x_{i+1})} = x_{i+1}$, we have
$z \in \{u_1, \ldots, u_{\mathrm{ind}(x_{i+1})-1}\}$; recall that the elements of
$\mathrm{Pref}_\alpha(w)$ are sorted with respect to their length. The above
argument is summarized by the next lemma.

Lemma 1. Let $x_1, \ldots, x_s \in \mathrm{Pref}_\alpha(w)$. If a word $u$ satisfies
$u\alpha \le_p x_1 \cdots x_s\alpha$, then there exists an integer $0 \le i < s$ such
that $u = x_1 \cdots x_iz$ for some
$z \in \{u_1, \ldots, u_{\mathrm{ind}(x_{i+1})-1}\}$.

A more natural setting is to assume that each of $x_1, \ldots, x_s$ is either an
element of $\mathrm{Pref}_\alpha(w)$ or an element of
$\overline{\mathrm{Suff}_{\overline{\alpha}}(w)}$ because, by left hairpin completion,
the complement of an $\overline{\alpha}$-suffix of $w$ can be produced to the left of
$w$. We need to generalize the function ind by extending its domain as follows: for
$x_i \in \overline{\mathrm{Suff}_{\overline{\alpha}}(w)}$, $\mathrm{ind}(x_i) = j$ if
$x_i = v_j$. Note that this generalized ind is not a function any more in cases when
$\mathrm{Pref}_\alpha(w) \cap \overline{\mathrm{Suff}_{\overline{\alpha}}(w)} \ne \emptyset$,
but this will not cause any problem in this paper.

$^1$ $\overline{x_1x_2\cdots x_s} = \overline{x_s}\cdots\overline{x_2}\,\overline{x_1}$.
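Lemma 2. Let $x_1, \ldots, x_s \in \mathrm{Pref}_\alpha(w) \cup \overline{\mathrm{Suff}_{\overline{\alpha}}(w)}$.
If a word $u$ satisfies $u\alpha \le_p x_1 \cdots x_s\alpha$, then there exists an
integer $0 \le i < s$ such that $u = x_1 \cdots x_iz$, where

$$\begin{cases} z \in \{u_1, \ldots, u_{\mathrm{ind}(x_{i+1})-1}\} & \text{if } x_{i+1} \in \mathrm{Pref}_\alpha(w); \\ z \in \{v_1, \ldots, v_{\mathrm{ind}(x_{i+1})-1}\} & \text{if } x_{i+1} \in \overline{\mathrm{Suff}_{\overline{\alpha}}(w)}. \end{cases}$$

Proof. As done previously, we can find $0 \le i < s$ and a nonempty word
$z \in \Sigma^+$ satisfying $u = x_1 \cdots x_iz$ and $z\alpha \le_p x_{i+1}\alpha$.
Since this prefix relation can be rewritten as
$\overline{\alpha}\,\overline{x_{i+1}} \ge_s \overline{\alpha}\,\overline{z}$, if
$\overline{x_{i+1}}$ is an $\overline{\alpha}$-suffix of $w$, so is $\overline{z}$.
The case when $x_{i+1} \in \mathrm{Pref}_\alpha(w)$ is clear from the previous
argument. □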

Having considered prefix relations among $\alpha$-prefixes and
$\overline{\alpha}$-suffixes of a word, we now extend our study to more general
factor relationships among them.

Lemma 3. If $u_j\alpha \ge_s u_i\alpha$ for some integers $2 \le i < j \le m$, then
$u_j \in \{u_1, u_2, \ldots, u_{j-1}\}u_i$.

Proof. We can let $xu_i\alpha = u_j\alpha$ for some $x \in \Sigma^*$. Combining this
with Proposition 2, we have $x\alpha \le_p u_j\alpha$ so that
$x \in \mathrm{Pref}_\alpha(w)$. Since $|x| < |u_j|$, $x$ is in
$\{u_1, u_2, \ldots, u_{j-1}\}$. □

Lemma 4. If $v_2\alpha$ is a factor of $u_2\alpha$, then $u_2 = v_2$.

Proof. Let $u_2\alpha = xv_2\alpha y$ for some $x, y \in \Sigma^*$. Unless
$y = \lambda$, $xv_2$ would be a nonempty $\alpha$-prefix of $w$ that is properly
shorter than $u_2$, which is a contradiction. Thus, $y$ must be empty so that
$u_2\alpha = xv_2\alpha$. Now, Lemma 3 leads us to $x = \lambda$. □

Finally, let us introduce interesting results that illustrate the close relationship
between $\alpha$-prefixes, commutativity, and primitivity, essential notions in
combinatorics on words.

Lemma 5. Let $w \in \alpha\Sigma^*$ and $u \in \mathrm{Pref}_\alpha(w)$. Then
$\rho(u), \rho(u)^2, \ldots, \rho(u)^{|u|/|\rho(u)|} \in \mathrm{Pref}_\alpha(w)$.

Proof. Due to the first statement of Proposition 2, $u \in \mathrm{Pref}_\alpha(w)$
enables us to let $\alpha y = u\alpha$ for some $y \in \Sigma^+$. Its solution is
well known to be $u = (st)^n$ and $\alpha = (st)^is$ for some $i \ge 0$ and
$s, t \in \Sigma^*$ such that $\rho(u) = st$. Hence,
$u\alpha = (st)^{i+n}s = \rho(u)\alpha(ts)^{n-1} = \rho(u)^2\alpha(ts)^{n-2} = \cdots = \rho(u)^n\alpha$. □

An immediate implication of this lemma is that the shortest nonempty
$\alpha$-prefix of a word that begins with $\alpha$ must be primitive. We should make
one more step forward. Imagine that a word $w$ has an $\alpha$-prefix $u$. If
$w \to_{\mathcal{RH}} w\overline{u}$ is possible, then
$w \to_{\mathcal{RH}} w\overline{\rho(u)}$ is also possible. Thus, repeating the
extension of $w$ to the right by $\overline{\rho(u)}$ $|u|/|\rho(u)|$ times amounts to
extending $w$ by $\overline{u}$ once. In other words, the process to extend a word by
$\overline{u}$ is not essential unless $u$ is primitive because it can always be
simulated by multiple processes to extend a word by $\overline{\rho(u)}$.

The next lemma proves that all nonempty $\alpha$-prefixes of length at most
$|\alpha|$ commute with each other, and hence, only the shortest one is essential in
the above sense.

Lemma 6. For nonempty words $x_1, x_2 \in \Sigma^+$, if
$\alpha \le_p x_1\alpha \le_p x_2\alpha$ and $|x_2| \le |\alpha|$ hold, then
$\rho(x_1) = \rho(x_2)$.

Proof. If $|x_1| = |x_2|$, then the prefix relation immediately gives $x_1 = x_2$, and
the conclusion of this lemma is trivial. Hence, we assume $|x_1| < |x_2|$. Combining
$|x_1| \le |\alpha|$ with $\alpha \le_p x_1\alpha$, we can deduce that the word
$x_1\alpha$ has a period $|x_1|$. Likewise, $x_2\alpha$ has a period $|x_2|$, and
hence, $x_1\alpha$ also has this period. As a result, $x_1\alpha$ has two periods
$|x_1|$, $|x_2|$, and moreover it is of length at least the sum of these periods.
Thus, Fine and Wilf's theorem [4, 6] leads us to the conclusion of this lemma. □
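Lemmas 5 and 6 can be sanity-checked on a small instance. The following Python sketch is only an illustration on one hand-picked word, not a proof; the helper names and the example $w = (ab)^4a$ with $\alpha = ababa$ are assumptions made here for demonstration.

```python
def primitive_root(w: str) -> str:
    """Shortest word u such that w is a power of u (w nonempty)."""
    n = len(w)
    for p in range(1, n + 1):
        if n % p == 0 and w[:p] * (n // p) == w:
            return w[:p]
    raise ValueError("empty word has no primitive root")


def alpha_prefixes(w: str, alpha: str) -> list[str]:
    """Pref_alpha(w), sorted by length."""
    return [w[:i] for i in range(len(w) - len(alpha) + 1) if w.startswith(alpha, i)]


def lemma5_holds(w: str, alpha: str) -> bool:
    """Every power rho(u)^1, ..., rho(u)^(|u|/|rho(u)|) of a nonempty alpha-prefix u is an alpha-prefix."""
    prefs = set(alpha_prefixes(w, alpha))
    for u in prefs:
        if u:
            r = primitive_root(u)
            if any(r * e not in prefs for e in range(1, len(u) // len(r) + 1)):
                return False
    return True


def lemma6_holds(w: str, alpha: str) -> bool:
    """All nonempty alpha-prefixes of length at most |alpha| share one primitive root."""
    roots = {primitive_root(u) for u in alpha_prefixes(w, alpha) if 0 < len(u) <= len(alpha)}
    return len(roots) <= 1


if __name__ == "__main__":
    w, alpha = "ababababa", "ababa"     # w begins with alpha
    print(alpha_prefixes(w, alpha))
    print(lemma5_holds(w, alpha), lemma6_holds(w, alpha))
```

Here the nonempty $\alpha$-prefixes are $ab$ and $abab$, both of length at most $|\alpha|$ and both with primitive root $ab$.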



3.1 Non-crossing words and their properties 

A word $w_0 \in \Sigma^*$ is an $(m,n)$-$\alpha$-word, or simply an $(m,n)$-word when
$\alpha$ is clear from the context, if $|\mathrm{Pref}_\alpha(w_0)| = m$ and
$|\mathrm{Suff}_{\overline{\alpha}}(w_0)| = n$. Informally speaking, an $(m,n)$-word
is a word on which $\alpha$ occurs $m$ times and $\overline{\alpha}$ does $n$ times.
For a pseudo-palindromic $\alpha$ ($\alpha = \overline{\alpha}$), we regard an
occurrence of $\alpha$ also as that of $\overline{\alpha}$, and as such, any word is
an $(m,m)$-word for some $m \ge 0$.

We say that $w_0$ is non-$\alpha$-crossing if the rightmost occurrence of $\alpha$
precedes the leftmost one of $\overline{\alpha}$ on $w_0$. When $\alpha$ is understood
from the context, we simply say that $w_0$ is non-crossing. Otherwise, the word is
$\alpha$-crossing or crossing. Note that if $\alpha = \overline{\alpha}$, then for a
word $w$ which is either a $(0,0)$-word or a $(1,1)$-word,
$\mathcal{H}^*_\alpha(w) = \{w\}$, and otherwise ($w$ is an $(m,m)$-word for some
$m \ge 2$), $w$ can be considered crossing. Thus, whenever a non-$\alpha$-crossing word
is concerned, we assume that $\alpha \ne \overline{\alpha}$. The definition of a word
being non-$\alpha$-crossing does not force the word to begin with $\alpha$ or end with
$\overline{\alpha}$. However, it is not until $\alpha$ is a primer that this notion
becomes useful in our work. Thus, the word should be in either $\alpha\Sigma^*$ or
$\Sigma^*\overline{\alpha}$. Actually, in the rest of this paper, we assume both of
these conditions and consider only single-primer iterated hairpin completion; thus, we
can assume that $w_0 \in \alpha\Sigma^* \cap \Sigma^*\overline{\alpha}$. As before,
elements of $\mathrm{Pref}_\alpha(w_0)$ are denoted by $u_1, \ldots, u_m$, those of
$\mathrm{Suff}_{\overline{\alpha}}(w_0)$ by $\overline{v_1}, \ldots, \overline{v_n}$,
and they are sorted so that this assumption imposes $u_1 = v_1 = \lambda$.
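The classification into $(m,n)$-words and the non-crossing test translate into a short occurrence scan. The sketch below is an assumption-laden illustration: complements are encoded by swapping letter case (a hypothetical convention, not the paper's), and "precedes" is read as a comparison of starting positions.

```python
def bar(w: str) -> str:
    """Involution under the assumed encoding: reverse the word and swap letter case."""
    return w[::-1].swapcase()


def occurrences(w: str, x: str) -> list[int]:
    """Starting positions of all (possibly overlapping) occurrences of x in w."""
    return [i for i in range(len(w) - len(x) + 1) if w.startswith(x, i)]


def classify(w: str, alpha: str) -> tuple[int, int]:
    """Return (m, n): the numbers of occurrences of alpha and of bar(alpha) in w."""
    return len(occurrences(w, alpha)), len(occurrences(w, bar(alpha)))


def is_non_crossing(w: str, alpha: str) -> bool:
    """True if the rightmost alpha starts before the leftmost bar(alpha).

    Assumption: 'precedes' compares starting positions; alpha must differ from bar(alpha).
    """
    if alpha == bar(alpha):
        raise ValueError("non-crossing words are considered only for alpha != bar(alpha)")
    a, b = occurrences(w, alpha), occurrences(w, bar(alpha))
    return not a or not b or a[-1] < b[0]


if __name__ == "__main__":
    print(classify("abacaAdA", "a"), is_non_crossing("abacaAdA", "a"))
    print(classify("aAaA", "a"), is_non_crossing("aAaA", "a"))
```

Under this reading, the word of Example 1 (with $k = 1$) is a non-crossing $(3,2)$-word, whereas `aAaA` is a crossing $(2,2)$-word.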

Our main focus lies on the characterization of non-crossing words whose iterated
hairpin completion is regular in terms of combinatorics on words. Thus, in
this subsection, we prove some combinatorial properties of non-crossing words.
Let us begin with an easy observation about the longest $\alpha$-prefix and
$\overline{\alpha}$-suffix of $w_0$.

Proposition 3. $u_m = v_n$ if and only if $m = n$ and for all $1 \le i \le m$,
$u_i = v_i$.

Next, we will see that one-step hairpin completion can extend $w_0$ to the left
by any of $v_1, \ldots, v_{n-1}$ or to the right by any of
$\overline{u_1}, \ldots, \overline{u_{m-1}}$ due to the following lemma.

Lemma 7. Let $w_0 \in \alpha\Sigma^* \cap \Sigma^*\overline{\alpha}$ be a non-crossing
word with $\mathrm{Pref}_\alpha(w_0) = \{u_1, \ldots, u_m\}$ and
$\mathrm{Suff}_{\overline{\alpha}}(w_0) = \{\overline{v_1}, \ldots, \overline{v_n}\}$.
Then $|u_{m-1}| + |v_n| + 2|\alpha| \le |w_0|$.
Proof. Suppose that this inequality did not hold. Being non-crossing, $w_0$ can
be written as $w_0 = u_{m-1}w\overline{v_n}$ for some
$w \in \alpha\Sigma^* \cap \Sigma^*\overline{\alpha}$ with $|w| < 2|\alpha|$. Hence,
$w = \overline{w}$. Let $x$ be a nonempty word satisfying $u_m = u_{m-1}x$. Since
$w_0$ is non-crossing, $u_m\alpha \le_p u_{m-1}w$ must hold, from which we have
$x\alpha \le_p w$. Combining this with $w = \overline{w}$ enables us to find an
$\overline{\alpha}$-suffix $\overline{x}\,\overline{v_n}$ of $w_0$, but this would be
longer than the longest $\overline{\alpha}$-suffix of $w_0$, a contradiction. □

This lemma does not rule out the possibility that $w_0$ cannot be extended to
the right by $\overline{u_m}$ by hairpin completion because the rightmost occurrence
of $\alpha$ might overlap with the suffix $\overline{\alpha}$. The analogous argument
is valid for $v_n$ and left hairpin completion. However, Lemma 7 leads us to one
important corollary on non-crossing $(m,n)$-words for $m, n \ge 2$: hairpin
completion can extend $w_0$ to the right by the complement of any of its
$\alpha$-prefixes and to the left by the complement of any of its
$\overline{\alpha}$-suffixes.

Corollary 2. Let $w_0 \in \alpha\Sigma^* \cap \Sigma^*\overline{\alpha}$ be a
non-crossing $(m,n)$-word with $m, n \ge 2$. Then
$\mathcal{H}_\alpha(w_0) = \{w_0\} \cup \{v_2, \ldots, v_n\}w_0 \cup w_0\{\overline{u_2}, \ldots, \overline{u_m}\}$.

Any word obtained from a non-crossing word by hairpin completion is non-crossing.
Though easily confirmed, this closure property forms the foundation of our
discussions in this paper.

Proposition 4. Let $\alpha \in \Sigma^k$ with $\alpha \ne \overline{\alpha}$, and
$w_0 \in \alpha\Sigma^* \cap \Sigma^*\overline{\alpha}$ be a non-crossing word. Then
any word in $\mathcal{H}^*_\alpha(w_0)$ is non-crossing.

We conclude this section with a characterization of a non-$\alpha$-crossing word
in terms of minimal factors with respect to the language
$\alpha\Sigma^* \cap \Sigma^*\overline{\alpha}$. With Proposition 4, this
characterization will bring a unique factorization theorem (Theorem 1) of any word
$w$ in $\mathcal{H}^*_\alpha(w_0)$ as $w = xw_0y$ for some words $x, y$.

Lemma 8. Let $\alpha \in \Sigma^k$ with $\alpha \ne \overline{\alpha}$. A word
$w_0 \in \alpha\Sigma^* \cap \Sigma^*\overline{\alpha}$ is non-crossing if and only if
it contains exactly one minimal factor $v$ from
$\alpha\Sigma^* \cap \Sigma^*\overline{\alpha}$.

Proof. Let us consider the contrapositive of the converse implication. So, if $w_0$
is crossing, then we can find an occurrence of $\overline{\alpha}$ (let us denote it
by $\overline{\alpha}_0$) which precedes an occurrence of $\alpha$ ($\alpha_1$).
$\overline{\alpha}_0$ is guaranteed to be preceded by another occurrence of $\alpha$
($\alpha_2$) because $w_0$ begins with $\alpha$. Thus, the factor of $w_0$ that
spans from $\alpha_2$ to $\overline{\alpha}_0$ is a minimal factor from
$\alpha\Sigma^* \cap \Sigma^*\overline{\alpha}$. By the same token, the factor of
$w_0$ that spans from $\alpha_1$ to its right adjacent occurrence of
$\overline{\alpha}$ becomes another minimal factor.

In order to prove the direct implication, suppose that $w_0$ contains two minimal
factors from $\alpha\Sigma^* \cap \Sigma^*\overline{\alpha}$. These two factors must
overlap with each other because otherwise the suffix $\overline{\alpha}$ of the first
factor precedes the prefix $\alpha$ of the second one and $w_0$ would be crossing.
However, if they overlap, then the overlapped part would be in
$\alpha\Sigma^* \cap \Sigma^*\overline{\alpha}$, and this contradicts the minimality
of the two factors. □

Theorem 1. Let $\alpha \in \Sigma^k$ with $\alpha \ne \overline{\alpha}$, and
$w_0 \in \alpha\Sigma^* \cap \Sigma^*\overline{\alpha}$ be a non-crossing word. On any
word in $\mathcal{H}^*_\alpha(w_0)$, $w_0$ occurs exactly once as a factor.

Proof. From the two facts that any word in $\mathcal{H}^*_\alpha(w_0)$ is
non-crossing (Proposition 4) and that these words contain at least one occurrence of
$w_0$ as a factor by definition of hairpin completion, we can reach this
conclusion. □
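Corollary 2 and Theorem 1 can be observed on the word of Example 1 by enumerating a few rounds of single-primer hairpin completion. The sketch below is a bounded brute-force exploration, not a decision procedure; the case-swap encoding of complements and the names `hairpin_step` and `iterate` are assumptions introduced for illustration.

```python
def bar(w: str) -> str:
    """Involution under the assumed encoding: reverse the word and swap letter case."""
    return w[::-1].swapcase()


def hairpin_step(w: str, alpha: str) -> set[str]:
    """One-step (left or right) hairpin completion of w with respect to the primer alpha."""
    k, ab, out = len(alpha), bar(alpha), set()
    if w.endswith(ab):                       # right: u alpha v bar(alpha) -> ... bar(u)
        for i in range(len(w) - 2 * k + 1):  # alpha must not overlap the suffix bar(alpha)
            if w.startswith(alpha, i):
                out.add(w + bar(w[:i]))
    if w.startswith(alpha):                  # left: alpha v bar(alpha) u -> bar(u) ...
        for i in range(k, len(w) - k + 1):   # bar(alpha) must not overlap the prefix alpha
            if w.startswith(ab, i):
                out.add(bar(w[i + k:]) + w)
    return out


def iterate(w0: str, alpha: str, steps: int) -> set[str]:
    """All words reachable from w0 in at most `steps` hairpin completions."""
    seen, frontier = {w0}, {w0}
    for _ in range(steps):
        frontier = {v for w in frontier for v in hairpin_step(w, alpha)} - seen
        seen |= frontier
    return seen


def count_factor(w: str, x: str) -> int:
    """Number of (possibly overlapping) occurrences of x in w."""
    return sum(w.startswith(x, i) for i in range(len(w) - len(x) + 1))


if __name__ == "__main__":
    w0, alpha = "abacaAdA", "a"
    print(sorted(hairpin_step(w0, alpha)))   # {w0} U {v2 w0} U w0 {bar(u2), bar(u3)}
    print(all(count_factor(w, w0) == 1 for w in iterate(w0, alpha, 3)))
```

On this instance the one-step result has exactly the shape stated in Corollary 2 (with $v_2$ encoded as `aD`, $\overline{u_2}$ as `BA`, and $\overline{u_3}$ as `CABA`), and every word found within three steps contains `abacaAdA` exactly once.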

4 Iterated hairpin completion of non-crossing words

This section contains the main contribution of this paper: characterizations
of the regularity of iterated hairpin completion of a non-crossing $(m,n)$-word
$w_0 \in \alpha\Sigma^* \cap \Sigma^*\overline{\alpha}$ (recall that
$\alpha \ne \overline{\alpha}$ is assumed). Throughout this section, $w_0$ is
thus assumed with $\mathrm{Pref}_\alpha(w_0) = \{u_1, \ldots, u_m\}$ and
$\mathrm{Suff}_{\overline{\alpha}}(w_0) = \{\overline{v_1}, \ldots, \overline{v_n}\}$.

Let us begin with a proof that one-sided hairpin completion of a non-crossing
word is regular (Theorem 2). Then we will show that the iterated hairpin
completion of a non-crossing $(m,1)$-word for any $m \ge 1$ or $(2,2)$-word is always
regular (Theorems 3 and 4). Using these results and combinatorial results shown in
Section 3, we characterize the set of all non-crossing $(3,2)$-words whose iterated
hairpin completion is regular, in terms of commutativity (Theorem 5).

Theorem 2. For a non-crossing word
$w_0 \in \alpha\Sigma^* \cap \Sigma^*\overline{\alpha}$, both
$\mathcal{LH}^*_\alpha(w_0)$ and $\mathcal{RH}^*_\alpha(w_0)$ are regular.

Proof. First, we prove the regularity of $\mathcal{RH}^*_\alpha(w_0)$. Let $u$ be an
$\alpha$-prefix of $w_0$. A right hairpin completion of $w_0$ can produce
$w_0\overline{u}$. Note that the suffix $\overline{\alpha}\,\overline{u}$ of this
resulting word does not contain $\alpha$ due to the non-crossing assumption on
$w_0$, and this means that the longest $\alpha$-prefix of $w_0\overline{u}$ is the
same as that of $w_0$. Thus, the language $\mathcal{RH}^*_\alpha(w_0)$ can be
obtained by iterated bounded hairpin completion from $w_0$, and hence, is
regular [14].

For the regularity of $\mathcal{LH}^*_\alpha(w_0)$, it suffices to observe that
$\overline{w_0}$ is also non-crossing. Using the result just proved,
$\mathcal{RH}^*_\alpha(\overline{w_0})$ is regular, and according to Proposition 1,
$\mathcal{LH}^*_\alpha(w_0) = \overline{\mathcal{RH}^*_\alpha(\overline{w_0})}$. Note
that the class of regular languages is closed under the involution. □

4.1 Iterated hairpin completion of (m, 1) non-crossing words

In this subsection, we consider the case $n = 1$ ($w_0$ is an $(m,1)$-word), and
prove that $\mathcal{H}^*_\alpha(w_0)$ is regular. For $m = 1$, it is easy to see
that hairpin completion cannot generate any word but $w_0$, that is,
$\mathcal{H}^*_\alpha(w_0) = \{w_0\}$. Hence, we assume $m \ge 2$.

Lemma 7 means that right hairpin completion can extend $w_0$ to the right by
any of $\overline{u_1}, \overline{u_2}, \ldots, \overline{u_{m-1}}$. In contrast, the
operation can extend $w_0$ to the right by $\overline{u_m}$ if and only if
$|u_m| + 2|\alpha| + |v_1| \le |w_0|$, i.e., the $\alpha$ to the right of $u_m$ does
not overlap with the suffix $\overline{\alpha}$ of $w_0$. As a result, if $m = 2$ but
this inequality does not hold, then $\mathcal{H}^*_\alpha(w_0) = \{w_0\}$. Therefore,
we can advance our discussion on the assumption that
$w_0 \to_{\mathcal{RH}} w_0\overline{u_2}$ is valid.

Note that $w_0\overline{u_2}$ is a non-crossing $(m,2)$-word. Applying Lemma 7 to
this word, we can see that $|u_m| + 2|\alpha| \le |w_0\overline{u_2}|$. Hence, hairpin
completion can extend $w_0\overline{u_2}$ further to the right not only by any of
$\overline{u_1}, \overline{u_2}, \ldots, \overline{u_{m-1}}$ but also by
$\overline{u_m}$.

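Let us define the following regular language:

$$R_{m1}(w_0) = \{w_0\} \cup \left\{ x_s \cdots x_1 w_0 \overline{y_1}\,\overline{y_2} \cdots \overline{y_t} \;\middle|\; \begin{array}{l} s \ge 0,\ t \ge 1, \\ y_1 \in \begin{cases} \{u_1, \ldots, u_{m-1}, u_m\} & \text{if } |u_m| + 2|\alpha| \le |w_0| \\ \{u_1, \ldots, u_{m-1}\} & \text{otherwise,} \end{cases} \\ x_s, \ldots, x_1, y_2, \ldots, y_t \in \{u_1, \ldots, u_m\}, \\ \text{and } \max_{1 \le i \le s}\{\mathrm{ind}(x_i)\} \le \max_{1 \le j \le t}\{\mathrm{ind}(y_j)\} \end{array} \right\}.$$

We claim that this language is the language obtained from $w_0$ by iterated hairpin
completion.

First, we prove that $\mathcal{H}^*_\alpha(w_0) \supseteq R_{m1}(w_0)$. Let
$w \in R_{m1}(w_0)$. By definition, any word in $R_{m1}(w_0)$ can be factorized as
$w = x_s \cdots x_1 w_0 \overline{y_1}\,\overline{y_2} \cdots \overline{y_t}$.
Compare the leftmost factor $x_s$ and the complement of the rightmost factor
$\overline{y_t}$ with respect to their index. Assume that
$\mathrm{ind}(x_s) \le \mathrm{ind}(y_t)$. Then
$w \ge_s \overline{\alpha}\,\overline{y_t} \ge_s \overline{\alpha}\,\overline{x_s}$.
Hence, one-step left hairpin completion can derive $w$ from the word
$x_{s-1} \cdots x_1 w_0 \overline{y_1} \cdots \overline{y_t}$. In the case when
$\mathrm{ind}(x_s) > \mathrm{ind}(y_t)$, the same argument implies that
$w \in \mathcal{RH}_\alpha(x_s \cdots x_1 w_0 \overline{y_1} \cdots \overline{y_{t-1}})$.
Due to $\max_{1 \le i \le s}\{\mathrm{ind}(x_i)\} \le \max_{1 \le j \le t}\{\mathrm{ind}(y_j)\}$,
the repetition of this process eventually reduces $w$ into a word
$w_0\overline{y_1} \cdots \overline{y_j}$ for some $1 \le j \le t$. Because of the
condition on $y_1$ and our discussion above,
$w_0 \to_{\mathcal{RH}} w_0\overline{y_1} \to_{\mathcal{RH}} \cdots \to_{\mathcal{RH}} w_0\overline{y_1} \cdots \overline{y_j}$
is valid. Thus, $w \in \mathcal{H}^*_\alpha(w_0)$.

Secondly, we prove the opposite inclusion by induction on the length of
derivation by hairpin completion. Clearly $w_0 \in R_{m1}(w_0)$. Let us assume that a
word in $\mathcal{H}^*_\alpha(w_0)$ can be written as
$x_s \cdots x_1 w_0 \overline{y_1} \cdots \overline{y_t}$ with
$\max_{1 \le i \le s}\{\mathrm{ind}(x_i)\} \le \max_{1 \le j \le t}\{\mathrm{ind}(y_j)\}$.
Let $j = \max_{1 \le i \le t}\{\mathrm{ind}(y_i)\}$. If left hairpin completion
extends this word to the left by $x$, then
$\overline{\alpha}\,\overline{y_1} \cdots \overline{y_t} \ge_s \overline{\alpha}\,\overline{x}$
and this means $x \in \{u_1, \ldots, u_j\}^+$ (see Lemma 1). Thus, there exist
$x_{s'}, \ldots, x_{s+1} \in \{u_1, \ldots, u_j\}$ such that
$x = x_{s'} \cdots x_{s+1}$ and
$\max\{\mathrm{ind}(x_{s'}), \ldots, \mathrm{ind}(x_{s+1}), \mathrm{ind}(x_s), \ldots, \mathrm{ind}(x_1)\} \le j$.
It is trivial that this inequality remains valid under right hairpin completion.

Theorem 3. For any $m \ge 1$ and a non-crossing $(m,1)$-word
$w_0 \in \alpha\Sigma^*\overline{\alpha}$, the language $\mathcal{H}^*_\alpha(w_0)$ is
regular.

The key idea in the above discussion is that if a word in
$\mathcal{H}^*_\alpha(w_0)$ begins with the longest $\alpha$-prefix $u_m$ of $w_0$,
then hairpin completion can extend it to the right by any of the $\alpha$-prefixes of
$w_0$. This idea has a broader range of applications. Let
$w_0 \in \alpha\Sigma^* \cap \Sigma^*\overline{\alpha}$ be a non-crossing
$(m,n)$-word for some $m, n \ge 1$ with
$\mathrm{Pref}_\alpha(w_0) = \{u_1, \ldots, u_m\}$ and
$\mathrm{Suff}_{\overline{\alpha}}(w_0) = \{\overline{v_1}, \ldots, \overline{v_n}\}$.
Proposition 3 says that if $u_m = v_n$, then
$\overline{\mathrm{Suff}_{\overline{\alpha}}(w_0)} = \mathrm{Pref}_\alpha(w_0)$. For
$m \ge 2$, the rightmost occurrence of $\alpha$ on $w_0$ does not overlap with the
suffix $\overline{\alpha}$ of $w_0$ (Lemma 7). Thus,
$\mathcal{H}^*_\alpha(w_0) = \{u_1, \ldots, u_m\}^* w_0 \{\overline{u_1}, \ldots, \overline{u_m}\}^*$.

Corollary 3. Let $w_0 \in \alpha\Sigma^* \cap \Sigma^*\overline{\alpha}$ be a
non-crossing $(m,n)$-word. If $u_m = v_n$, then $\mathcal{H}^*_\alpha(w_0)$ is
regular.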
4.2 Iterated hairpin completion of (2, 2) non-crossing words

In contrast to the result obtained in the previous subsection, Example 1 shows
that there exists an $(m,2)$ non-crossing word whose iterated hairpin completion
is non-regular with $m = 3$. This result motivates the study of $(2,2)$ non-crossing
words reported here. Let $w_0 \in \alpha\Sigma^* \cap \Sigma^*\overline{\alpha}$ be a
non-crossing $(2,2)$-word. We can employ Corollary 2 to see that
$\mathcal{H}_\alpha(w_0) = \{w_0, v_2w_0, w_0\overline{u_2}\}$. This further
implies that the suffix $\overline{\alpha}$ of any word in
$\mathcal{H}^*_\alpha(w_0)$ can bind with the second $\alpha$ on the (unique) factor
$w_0$ of the word for right hairpin completion.

Let us define the following regular language:

$$R_{22L} = v_2^*(v_2w_0)\overline{v_2}^* \cup (v_2^+u_2)^* v_2^*(v_2w_0)\overline{v_2}^*(\overline{u_2}\,\overline{v_2}^+)^+.$$

We will show that this language is exactly the set of words obtained by iterated
hairpin completion from $v_2w_0$.

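In order to prove that $\mathcal{H}^*_\alpha(v_2w_0) \supseteq R_{22L}$, it suffices
to present the following process:

$$\begin{aligned}
v_2w_0 &\to^*_{\mathcal{RH}} v_2w_0\overline{v_2}^{j_0} \\
&\to_{\mathcal{RH}} v_2w_0\overline{v_2}^{j_0}\overline{u_2}\,\overline{v_2} \\
&\to^*_{\mathcal{RH}} v_2w_0\overline{v_2}^{j_0}\overline{u_2}\,\overline{v_2}^{j_1} \\
&\to_{\mathcal{RH}} v_2w_0\overline{v_2}^{j_0}\overline{u_2}\,\overline{v_2}^{j_1}\overline{u_2}\,\overline{v_2} \\
&\to^*_{\mathcal{RH}} v_2w_0\overline{v_2}^{j_0}\overline{u_2}\,\overline{v_2}^{j_1} \cdots \overline{u_2}\,\overline{v_2}^{j_{t-1}}\overline{u_2}\,\overline{v_2} \\
&\to^*_{\mathcal{LH}} v_2^{i_0}\,v_2w_0\overline{v_2}^{j_0}\overline{u_2}\,\overline{v_2}^{j_1} \cdots \overline{u_2}\,\overline{v_2}^{j_{t-1}}\overline{u_2}\,\overline{v_2} \\
&\to_{\mathcal{LH}} v_2u_2v_2^{i_0}\,v_2w_0\overline{v_2}^{j_0}\overline{u_2}\,\overline{v_2}^{j_1} \cdots \overline{u_2}\,\overline{v_2}^{j_{t-1}}\overline{u_2}\,\overline{v_2} \\
&\to^*_{\mathcal{LH}} v_2^{i_1}u_2v_2^{i_0}\,v_2w_0\overline{v_2}^{j_0}\overline{u_2}\,\overline{v_2}^{j_1} \cdots \overline{u_2}\,\overline{v_2}^{j_{t-1}}\overline{u_2}\,\overline{v_2} \\
&\to^*_{\mathcal{LH}} v_2^{i_s}u_2 \cdots v_2^{i_1}u_2v_2^{i_0}\,v_2w_0\overline{v_2}^{j_0}\overline{u_2}\,\overline{v_2}^{j_1} \cdots \overline{u_2}\,\overline{v_2}^{j_{t-1}}\overline{u_2}\,\overline{v_2} \\
&\to^*_{\mathcal{RH}} v_2^{i_s}u_2 \cdots v_2^{i_1}u_2v_2^{i_0}\,v_2w_0\overline{v_2}^{j_0}\overline{u_2}\,\overline{v_2}^{j_1} \cdots \overline{u_2}\,\overline{v_2}^{j_{t-1}}\overline{u_2}\,\overline{v_2}^{j_t}.
\end{aligned}$$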

Next, we prove the opposite inclusion by induction on the length of derivation
by hairpin completion from $v_2w_0$. Obviously, $v_2w_0 \in R_{22L}$. Assume that all
words obtained from $v_2w_0$ by at most $n$-times hairpin completion are in
$R_{22L}$. Let $w_n$ be such a word and consider a word $w_{n+1}$ such that
$w_n \to_{\mathcal{H}} w_{n+1}$. Consider the case when this hairpin completion is a
right one. The rightmost occurrence of $\alpha$ on $w_n$ is the second $\alpha$ on
its (unique) factor $w_0$. Therefore, if we let $w_{n+1} = w_n\overline{x}$, then
$x\alpha \le_p (v_2^+u_2)^*v_2^+u_2\alpha$. Since $u_2$ and $v_2$ are the respective
shortest nonempty $\alpha$-prefix and complement of an $\overline{\alpha}$-suffix of
$w_0$, Lemma 2 implies that $x \in (v_2^+u_2)^*v_2^*$. Note that $R_{22L}$ is closed
under catenating a word in $\overline{(v_2^+u_2)^*v_2^*}$ to the right. Thus,
$w_{n+1} \in R_{22L}$. The case when $w_n \to_{\mathcal{LH}} w_{n+1}$ can be proved
in a symmetric manner.

Due to the symmetry of $u_2$ and $v_2$, we can easily construct a regular language
$R_{22R}$ which is equivalent to $\mathcal{H}^*_\alpha(w_0\overline{u_2})$. Now the
regularity of $\mathcal{H}^*_\alpha(w_0)$ has been proved.

Theorem 4. For a $(2,2)$ non-crossing word $w_0 \in \alpha\Sigma^*\overline{\alpha}$,
the language $\mathcal{H}^*_\alpha(w_0)$ is regular.
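The shape of $R_{22L}$ can be compared against a brute-force enumeration on one small $(2,2)$-word. The sketch below is only a plausibility check under several assumptions: the regular expression encodes our reading $v_2^*(v_2w_0)\overline{v_2}^* \cup (v_2^+u_2)^*v_2^*(v_2w_0)\overline{v_2}^*(\overline{u_2}\,\overline{v_2}^+)^+$ of $R_{22L}$, complements are encoded by swapping letter case, the example word `abaAcA` is hand-picked, and only words reachable within three steps are tested (one inclusion, bounded depth).

```python
import re


def bar(w: str) -> str:
    """Involution under the assumed encoding: reverse the word and swap letter case."""
    return w[::-1].swapcase()


def hairpin_step(w: str, alpha: str) -> set[str]:
    """One-step (left or right) hairpin completion of w with respect to alpha."""
    k, ab, out = len(alpha), bar(alpha), set()
    if w.endswith(ab):
        for i in range(len(w) - 2 * k + 1):
            if w.startswith(alpha, i):
                out.add(w + bar(w[:i]))
    if w.startswith(alpha):
        for i in range(k, len(w) - k + 1):
            if w.startswith(ab, i):
                out.add(bar(w[i + k:]) + w)
    return out


def iterate(w0: str, alpha: str, steps: int) -> set[str]:
    """All words reachable from w0 in at most `steps` hairpin completions."""
    seen, frontier = {w0}, {w0}
    for _ in range(steps):
        frontier = {v for w in frontier for v in hairpin_step(w, alpha)} - seen
        seen |= frontier
    return seen


# Hypothetical (2,2)-word with alpha = "a": u2 = "ab", v2 = bar("cA") = "aC".
W0, ALPHA, U2, V2 = "abaAcA", "a", "ab", "aC"
CORE = re.escape(V2 + W0)
BV2, BU2 = re.escape(bar(V2)), re.escape(bar(U2))
R22L = re.compile(
    rf"(?:{V2})*{CORE}(?:{BV2})*"
    rf"|(?:(?:{V2})+{U2})*(?:{V2})*{CORE}(?:{BV2})*(?:{BU2}(?:{BV2})+)+"
)

if __name__ == "__main__":
    reachable = iterate(V2 + W0, ALPHA, 3)
    print(len(reachable), all(R22L.fullmatch(w) for w in reachable))
```

A successful run only shows that no counterexample to $\mathcal{H}^*_\alpha(v_2w_0) \subseteq R_{22L}$ appears at small depth for this word; it says nothing about other words or about the converse inclusion.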
4.3 Iterated hairpin completion of (3, 2) non-crossing words

Theorem 4 and Example 1 motivate our investigation of non-crossing $(3,2)$-words.
Actually, Theorem 5, a main contribution of this paper, provides a
characterization of the regularity of iterated hairpin completion of a non-crossing
$(3,2)$-word in terms of the commutativity of the $\alpha$-prefixes and
$\overline{\alpha}$-suffixes of the word.

Let $w_0 \in \alpha\Sigma^* \cap \Sigma^*\overline{\alpha}$ be a non-crossing
$(3,2)$-word (so $\alpha \ne \overline{\alpha}$) with
$\mathrm{Pref}_\alpha(w_0) = \{\lambda, u_2, u_3\}$ and
$\mathrm{Suff}_{\overline{\alpha}}(w_0) = \{\lambda, \overline{v_2}\}$. Note that
$u_2$ ($v_2$) must be primitive; otherwise, its primitive root is also an
$\alpha$-prefix (resp. the complement of an $\overline{\alpha}$-suffix) of $w_0$ and
$w_0$ would not be a $(3,2)$-word any more. As a result, $u_2$ commutes with $v_2$
($u_3$) if and only if $u_2 = v_2$ (resp. $u_3 = u_2^2$). Recall also that
$u_3 \ne v_2$ must hold for $w_0$ to be a $(3,2)$-word (Proposition 3). Thus, if
$u_3$ and $v_2$ commute, then $u_3 = v_2^2$ and $u_2 = v_2$. In other words, the
commutativity between $u_3$ and $v_2$ is reduced to the commutativity between $u_2$
and $u_3$ and the commutativity between $u_2$ and $v_2$, and hence, is not essential.

Corollary 2 states that
$\mathcal{H}_\alpha(w_0) = \{w_0\} \cup \{v_2w_0, w_0\overline{u_2}, w_0\overline{u_3}\}$.
Let us ask the question of whether iterated hairpin completion can generate the same
word from $w_0\overline{u_2}$ and $w_0\overline{u_3}$. We partially answer this
question in a broader setting for arbitrary $m \ge 3$ and $n \ge 1$.

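Lemma 9. Let $w_0 \in \alpha\Sigma^* \cap \Sigma^*\overline{\alpha}$ be a non-crossing $(m, n)$-word for some $m \geq 3$
and $n \geq 1$ with $\mathrm{Pref}_\alpha(w_0) = \{u_1, \ldots, u_m\}$. For integers $i, j$ with $1 < i < j$,
if $u_j \in \{u_2, \ldots, u_{j-1}\}u_i$, then $\mathcal{H}^*_\alpha(w_0\overline{u_j}) \subseteq \mathcal{H}^*_\alpha(w_0\overline{u_i})$; otherwise,
$\mathcal{H}^*_\alpha(w_0\overline{u_j}) \cap \Sigma^*w_0\overline{u_i}\Sigma^* = \emptyset$.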

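Proof. Let $u_j = xu_i$ for some $x \in \{u_2, \ldots, u_{j-1}\}$. Lemma [7] implies that
$w_0\overline{u_i} \rightarrow_{\mathcal{RH}} w_0\overline{u_i}\,\overline{x} = w_0\overline{u_j}$ is possible. Thus, the inclusion holds. Conversely, if
the intersection is not empty, then Theorem Q] implies that $\overline{\alpha}\,\overline{u_j} = \overline{\alpha}\,\overline{u_i}\,y$ for
some $y \in \Sigma^+$. Then, due to Lemma El, this equation gives $y \in \{\overline{u_2}, \ldots, \overline{u_{j-1}}\}$;
thus, $u_j \in \{u_2, \ldots, u_{j-1}\}u_i$. □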
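
The membership condition of Lemma 9 is a finite check once the $\alpha$-prefixes are
known. The sketch below is a hypothetical helper, not taken from the paper; it
assumes the prefixes are given as a Python list ordered as $u_1, \ldots, u_m$ with
$u_1$ the empty word, and it only tests the condition $u_j \in \{u_2, \ldots, u_{j-1}\}u_i$.

```python
def lemma9_condition(prefixes: list[str], i: int, j: int) -> bool:
    """Test whether u_j = x + u_i for some x in {u_2, ..., u_{j-1}}.

    prefixes[k - 1] holds u_k (1-based indices as in the lemma);
    the indices must satisfy 1 < i < j <= len(prefixes).
    """
    if not (1 < i < j <= len(prefixes)):
        raise ValueError("indices must satisfy 1 < i < j <= m")
    u_i, u_j = prefixes[i - 1], prefixes[j - 1]
    candidates = prefixes[1:j - 1]  # u_2, ..., u_{j-1}
    return any(u_j == x + u_i for x in candidates)
```

With the list `["", "AC", "ACAC"]` and $i = 2$, $j = 3$ the condition holds, since
$u_3 = u_2u_2$; with `["", "AC", "ACG"]` it fails.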

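We can employ Lemma [9] in our current setting of non-crossing (3, 2)-words
to observe that if $u_3 = u_2^2$, then $\mathcal{H}^*_\alpha(w_0\overline{u_3}) \subseteq \mathcal{H}^*_\alpha(w_0\overline{u_2})$; otherwise,
$\mathcal{H}^*_\alpha(w_0\overline{u_3}) \cap \Sigma^*w_0\overline{u_2}\Sigma^* = \emptyset$. Thus, for example, if $u_3 \neq u_2^2$, then
$\mathcal{H}^*_\alpha(w_0\overline{u_3}) \cap \mathcal{H}^*_\alpha(w_0\overline{u_2}) = \emptyset$.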

In this subsection, we first prove that the commutativity of $u_2$ with $v_2$ or
with $u_3$ is a sufficient condition for $\mathcal{H}^*_\alpha(w_0)$ to be regular.

Lemma 10. If $u_2 = v_2$, then the language $\mathcal{H}^*_\alpha(w_0)$ is regular.

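Proof. Let $w_0 = w\overline{v_2}$ for some $w \in \alpha\Sigma^* \cap \Sigma^*\overline{\alpha}$. Observe that $w$ is a
non-crossing (3, 1)-word with $u_2, u_3$ being its nonempty $\alpha$-prefixes. Lemma [7] implies
that $|u_2| + 2|\alpha| \leq |w|$, which means that hairpin completion can extend $w$ to
the right by $\overline{u_2}$ and result in $w\overline{u_2}$, which is $w_0$ because $u_2 = v_2$. If
$|u_3| + 2|\alpha| \leq |w|$, then hairpin completion can also generate $w\overline{u_3}$, but it is
not essential in the following discussion whether this is possible or not. Let us
consider only the case when it is possible. Then $\mathcal{H}^*_\alpha(w)$, which is regular due to
Theorem [3], is $\{w\} \cup \mathcal{H}^*_\alpha(w\overline{u_2}) \cup \mathcal{H}^*_\alpha(w\overline{u_3})$. As we have seen above, if
$w\overline{u_3} \in \mathcal{H}_\alpha(w)$, then either $\Sigma^*w\overline{u_2}\Sigma^* \cap \mathcal{H}^*_\alpha(w\overline{u_3}) = \emptyset$ or
$\mathcal{H}^*_\alpha(w\overline{u_2}) \supseteq \mathcal{H}^*_\alpha(w\overline{u_3})$. In any case, $\mathcal{H}^*_\alpha(w_0) = \mathcal{H}^*_\alpha(w) \cap \Sigma^*w\overline{u_2}\Sigma^*$, and
hence, is regular. □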

Now it is easy to see that $\mathcal{H}^*_\alpha(w_0)$ is regular when $u_3$ commutes with $v_2$.
Since $w_0$ is a (3, 2)-word, $v_2$ must be primitive and $u_3$ is equal to either $v_2$ or $v_2^2$.
In the former case, $u_2$ is a proper prefix of $v_2$ so that $w_0$ has $\overline{u_2}$ as an
$\overline{\alpha}$-suffix and would not be a (3, 2)-word. Thus, the latter must be the case. In
this case, the prefix $v_2$ of $u_3$, which is the primitive root of $u_3$, is an $\alpha$-prefix
of $w_0$ (Lemma [5]), and hence, in order for $w_0$ to be a (3, 2)-word, $u_2 = v_2$ must
hold, and this brings the conclusion according to Lemma [10].

Lemma 11. If $u_3 = u_2^2$, then the language $\mathcal{H}^*_\alpha(w_0)$ is regular.

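Proof. Lemma [10] makes it sufficient to consider the case when $u_2$ does not
commute with $v_2$. Since $\mathcal{H}^*_\alpha(w_0) = \{w_0\} \cup \mathcal{H}^*_\alpha(v_2w_0) \cup \mathcal{H}^*_\alpha(w_0\overline{u_2}) \cup \mathcal{H}^*_\alpha(w_0\overline{u_2}^2)$
(when checking this, recall Lemma [7]) and $\mathcal{H}^*_\alpha(w_0\overline{u_2}) \supseteq \mathcal{H}^*_\alpha(w_0\overline{u_2}^2)$, we
will show the regularity of the second and third terms of this equation, and that
is enough for our purpose.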

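First, we prove that $\mathcal{H}^*_\alpha(w_0\overline{u_2})$ is regular. Let $w_0 = u_2w$, where
$w \in \alpha\Sigma^* \cap \Sigma^*\overline{\alpha}$ is a (2, 2)-word with $\mathrm{Pref}_\alpha(w) = \{\lambda, u_2\}$ and
$\mathrm{Suff}_{\overline{\alpha}}(w) = \{\lambda, \overline{v_2}\}$. We can easily check that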

7~L* a {w) = {W, WV^,Unpf} U %* a (U2WU^) U %* a (u 2 V2WU2) U Ha(v2w). 

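As done in the proof of Lemma [10], the non-commutativity between $u_2$ and
$v_2$ implies that $(\mathcal{H}^*_\alpha(u_2v_2w\overline{u_2}) \cup \mathcal{H}^*_\alpha(v_2w)) \cap \Sigma^*u_2w\Sigma^* = \emptyset$. Thus,
$\mathcal{H}^*_\alpha(w) \cap \Sigma^*u_2w\Sigma^* = \mathcal{H}^*_\alpha(w_0\overline{u_2})$. Since $w$ is a non-crossing (2, 2)-word,
$\mathcal{H}^*_\alpha(w)$ is regular (Theorem [4]), and hence, so is $\mathcal{H}^*_\alpha(w_0\overline{u_2})$.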

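Next, we prove the regularity of $\mathcal{H}^*_\alpha(v_2w_0)$. We can let $w_0 = w'\overline{v_2}$ for
some (3, 1)-word $w'$. This means that $v_2w'$ is a (4, 1)-word with
$\mathrm{Pref}_\alpha(v_2w') = \{\lambda, v_2, v_2u_2, v_2u_2^2\}$ and the empty $\overline{\alpha}$-suffix. Thus,

$\mathcal{H}^*_\alpha(v_2w') = \{v_2w'\} \cup \mathcal{H}^*_\alpha(v_2w'\overline{v_2}) \cup \mathcal{H}^*_\alpha(v_2w'\overline{u_2}\,\overline{v_2}) \cup \mathcal{H}^*_\alpha(v_2w'\overline{u_2}^2\overline{v_2})$.

Using essentially the same argument as above, we obtain
$\mathcal{H}^*_\alpha(v_2w') \cap \Sigma^*v_2w'\overline{v_2}\Sigma^* = \mathcal{H}^*_\alpha(v_2w_0)$. Since the iterated hairpin completion
of a non-crossing (4, 1)-word is regular (Theorem [3]), $\mathcal{H}^*_\alpha(v_2w')$ is regular and
so is $\mathcal{H}^*_\alpha(v_2w_0)$.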

Combining what has been proved in the previous two paragraphs,
we conclude the regularity of $\mathcal{H}^*_\alpha(w_0)$. □

To summarize the results obtained so far, for $\mathcal{H}^*_\alpha(w_0)$ not to be regular, no
two of the nonempty $\alpha$-prefixes and the complements of nonempty $\overline{\alpha}$-suffixes of
$w_0$, i.e., $u_2$, $u_3$, $v_2$, may commute.

Lemma 12. If $u_3 = u_2v_2$, then the language $\mathcal{H}^*_\alpha(w_0)$ is regular.

Proof. Due to Lemma [10], it suffices to consider this problem under the
assumption $u_2 \neq v_2$, which, in our setting, is equivalent to saying that $u_2$
does not commute with $v_2$.

We have $\mathcal{H}^*_\alpha(w_0) = \{w_0\} \cup \mathcal{H}^*_\alpha(v_2w_0) \cup \mathcal{H}^*_\alpha(w_0\overline{u_2}) \cup \mathcal{H}^*_\alpha(w_0\overline{v_2}\,\overline{u_2})$. As done
before, we will check that the second, third, and fourth terms of the union above
are regular. The regularity of the third one follows from
$\mathrm{Pref}_\alpha(w_0\overline{u_2}) = \{\lambda, u_2, u_2v_2\}$ and $\mathrm{Suff}_{\overline{\alpha}}(w_0\overline{u_2}) = \{\lambda, \overline{u_2}, \overline{v_2}\,\overline{u_2}\}$ and Corollary [3].

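In order to check that the second term is regular, let $w_0 = w_1\overline{v_2}$, where $w_1$
is a (3, 1)-word. Then $v_2w_1$ is a (4, 1)-word, and

$\mathcal{H}^*_\alpha(v_2w_1) = \{v_2w_1\} \cup \mathcal{H}^*_\alpha(v_2w_1\overline{v_2}) \cup \mathcal{H}^*_\alpha(v_2w_1\overline{u_2}\,\overline{v_2}) \cup \mathcal{H}^*_\alpha(v_2w_1\overline{v_2}\,\overline{u_2}\,\overline{v_2})$.

Since $v_2w_1\overline{v_2} \rightarrow_{\mathcal{RH}} v_2w_1\overline{v_2}\,\overline{u_2}\,\overline{v_2}$ and $\mathcal{H}^*_\alpha(v_2w_1\overline{u_2}\,\overline{v_2}) \cap \Sigma^*v_2w_1\overline{v_2}\Sigma^* = \emptyset$, we
have $\mathcal{H}^*_\alpha(v_2w_0) = \mathcal{H}^*_\alpha(v_2w_1\overline{v_2}) = \mathcal{H}^*_\alpha(v_2w_1) \cap \Sigma^*v_2w_1\overline{v_2}\Sigma^*$. The regularity of
$\mathcal{H}^*_\alpha(v_2w_1)$ is due to Theorem [3], so that $\mathcal{H}^*_\alpha(v_2w_0)$ is regular.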

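What remains to be considered is the fourth term. One can let $w_0\overline{v_2}\,\overline{u_2} = u_2v_2w_2$
for some non-crossing (1, 4)-word $w_2$. Then
$\mathcal{H}^*_\alpha(w_2) = \{w_2\} \cup \mathcal{H}^*_\alpha(u_2w_2) \cup \mathcal{H}^*_\alpha(u_2v_2w_2) \cup \mathcal{H}^*_\alpha(u_2v_2^2w_2)$ holds, and we can easily
see that $\mathcal{H}^*_\alpha(u_2v_2w_2) = \mathcal{H}^*_\alpha(w_2) \cap \Sigma^*u_2v_2w_2\Sigma^*$. Thus, the regularity of
$\mathcal{H}^*_\alpha(w_0\overline{v_2}\,\overline{u_2})$ has been proved. □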

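Theorem 5. Let $w_0 \in \alpha\Sigma^* \cap \Sigma^*\overline{\alpha}$ be a non-crossing (3, 2)-word with
$\mathrm{Pref}_\alpha(w_0) = \{\lambda, u_2, u_3\}$ and $\mathrm{Suff}_{\overline{\alpha}}(w_0) = \{\lambda, \overline{v_2}\}$. Then $\mathcal{H}^*_\alpha(w_0)$ is regular if
and only if one of the following three conditions holds:

1. $u_2$ commutes with $v_2$;

2. $u_2$ commutes with $u_3$;

3. $u_3 = u_2v_2$.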

Proof. Let $R = u_3u_2^{\geq 2}v_2w_0\overline{u_2}^{\geq 2}\overline{u_3}$, which is a regular language. Under the
assumption that none of the conditions 1-3 holds,
$L := \mathcal{H}^*_\alpha(w_0) \cap R = \{u_3u_2^iv_2w_0\overline{u_2}^i\overline{u_3} \mid i \geq 2\}$ holds. As mentioned previously, if
the second condition does not hold, which is equivalent to $u_3 \neq u_2^2$, then
$\mathcal{H}^*_\alpha(w_0\overline{u_3})$ cannot contain any word in the above intersection. Thus,
$L = (\mathcal{H}^*_\alpha(w_0\overline{u_2}) \cap R) \cup (\mathcal{H}^*_\alpha(v_2w_0) \cap R)$. Using Lemmas [3] and 21 we can easily
prove the emptiness of the second intersection of the above union. This check is
left to the reader; we recommend checking at least
$\mathcal{H}^*_\alpha(v_2w_0\overline{u_3}\,\overline{v_2}) \cap R = \emptyset$, because this check involves the important fact that
$\overline{\alpha}\,\overline{u_2} <_p \overline{\alpha}\,\overline{u_3}$ implies $u_3 = u_2^2$, which causes a contradiction. As a result, we
have $L = \mathcal{H}^*_\alpha(w_0\overline{u_2}) \cap R$. Informally speaking, in order to produce a word in $R$
from $w_0$, we first have to extend $w_0$ to the right by $\overline{u_2}$.

Now we can extend $w_0\overline{u_2}$ to the right repeatedly to obtain $w_0\overline{u_2}^i$. If this
obtained word is extended to the left, then the word will be in $u_2\Sigma^*w_0\Sigma^*\overline{u_2}$.
Let us check that $u_2\Sigma^*w_0\Sigma^*\overline{u_2} \cap u_3\Sigma^*w_0\Sigma^*\overline{u_3} = \emptyset$. If the intersection is not
empty, then $u_3\alpha <_p u_2x\alpha$ for some $x \in \{u_2, u_3, v_2\}^+$. Due to Lemma [5],
$u_3 \in u_2\{u_2, v_2\}^+$, but actually we can say $u_3 \in u_2\{u_2, v_2\}$ because $u_3$ is the second
shortest nonempty $\alpha$-prefix of $w_0$. However, this means that either $u_3 = u_2^2$
or $u_3 = u_2v_2$, that is, condition 2 or 3 holds, which contradicts our assumption.
Thus, we have only one choice: extending $w_0\overline{u_2}^i$ to the right by $\overline{u_3}$.

As mentioned above, $\overline{\alpha}\,\overline{u_2} <_p \overline{\alpha}\,\overline{u_3}$ cannot hold, so we cannot extend
$w_0\overline{u_2}^i\overline{u_3}$ further to the right to obtain a word in $R$. Thus, we should extend this
word to the left either by $u_3u_2^j$ for some $j \leq i$ or by $u_3u_2^iv_2$. Lemmas [3] and HI
prove that the former choice will not lead us to any word in $R$. Now it suffices
to mention that extending the resulting word further to the left cannot yield a
word in $R$, because such an extension would force the contradictory relation
$\overline{\alpha}\,\overline{u_2} <_p \overline{\alpha}\,\overline{u_3}$ to hold. □
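
The three conditions of Theorem 5 are purely combinatorial and can be tested
directly once $u_2$, $u_3$ and $v_2$ are known. The following Python sketch is a
hypothetical helper, not part of the paper. It only evaluates the three
conditions on the given strings and does not verify that they actually arise as
the $\alpha$-prefixes and the complement of the $\overline{\alpha}$-suffix of a non-crossing
(3, 2)-word; that is an assumption left to the caller.

```python
def theorem5_conditions(u2: str, u3: str, v2: str) -> list[int]:
    """Return the numbers (1, 2, 3) of the Theorem 5 conditions that hold."""
    def commute(x: str, y: str) -> bool:
        return x + y == y + x

    holding = []
    if commute(u2, v2):      # condition 1: u2 commutes with v2
        holding.append(1)
    if commute(u2, u3):      # condition 2: u2 commutes with u3
        holding.append(2)
    if u3 == u2 + v2:        # condition 3: u3 = u2 v2
        holding.append(3)
    return holding


def predicted_regular(u2: str, u3: str, v2: str) -> bool:
    """Regularity as predicted by Theorem 5, under the caller's assumptions."""
    return bool(theorem5_conditions(u2, u3, v2))
```

For instance, `theorem5_conditions("AC", "ACAG", "AG")` reports condition 3
only, while for `("AC", "ACGAC", "AG")` none of the conditions holds.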

5 Conclusion 

In this paper, we focused on finding conditions that a word $w_0 \in \alpha\Sigma^* \cap \Sigma^*\overline{\alpha}$
must satisfy so that its iterated hairpin completion $\mathcal{H}^*_\alpha(w_0)$ is a regular language.
We classified the set of all non-crossing words according to the number $m$ of
occurrences of $\alpha$ and the number $n$ of occurrences of $\overline{\alpha}$ in a given word. For
the cases when $n = 1$ and when $m = n = 2$, we proved that the iterated hairpin
completion of a non-crossing $(m, n)$-word is regular. We also found a necessary
and sufficient condition under which the iterated hairpin completion of a
non-crossing (3, 2)-word is regular. This approach can be generalized to arbitrary
non-crossing $(m, n)$-words, with the cases $(m, 1)$ and $(2, 2)$ being the induction
base of an inductive proof. Future work includes considering the same problem
for crossing words. In this case, Lemma [7] or Theorem [T] does not hold any more,
and hence, it may get harder to analyze the derivation processes of how a word
is obtained from a given word $w_0$ by iterated hairpin completion. In addition,
we investigated only the case when the suffix of length $k$ of an initial word $w_0$
is the complement of its prefix of the same length, but we eventually have to
consider $w_0$ in $\alpha\Sigma^* \cap \Sigma^*\overline{\beta}$, where $\beta$ might not be equal to $\alpha$ (double-primer
hairpin completion). We can easily observe that one-step hairpin completion
with respect to $\alpha$ ($\beta$) derives a word in $\beta\Sigma^* \cap \Sigma^*\overline{\beta}$ (resp. $\alpha\Sigma^* \cap \Sigma^*\overline{\alpha}$) from
$w_0$. Thus, results obtained in this study of single-primer hairpin completion are
an important step towards this most general setting of the regularity test problem
of iterated hairpin completion of a single word. Another direction of research is
to consider stopper sequences as in Whiplash PCR [7, 20].